Effect of Missing Data Types and Imputation Methods on Supervised Classifiers: An Evaluation Study

Abstract

Data completeness is one of the most common challenges that hinder the performance of data analytics platforms. Different studies have assessed the effect of missing values on different classification models based on a single evaluation metric, namely, accuracy. However, accuracy on its own is a misleading measure of classifier performance because it does not account for unbalanced datasets. This paper presents an experimental study that assesses the effect of incomplete datasets on the performance of five classification models. The analysis was conducted with different ratios of missing values in six datasets that vary in size, type, and balance. Moreover, for an unbiased analysis, the performance of the classifiers was measured using three metrics: the Matthews correlation coefficient (MCC), the F1-score, and accuracy. The results show that the sensitivity of supervised classifiers to missing data differs according to a set of factors. The most significant factor is the missing data pattern and ratio, followed by the imputation method, and then the balance of the dataset. Classifiers are less sensitive when data are Missing Completely At Random (MCAR) than when data follow the Missing Not At Random (MNAR) pattern. Furthermore, MCC, as an evaluation metric, better reflects the variation in classifier performance caused by missing data.
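Two technical points in the abstract, that accuracy can mislead on unbalanced data and that MCAR and MNAR missingness affect imputation differently, can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' experimental code: the toy data, the function names, the particular MNAR scheme (hiding the largest values) and the use of mean imputation are all illustrative assumptions. The metrics use the standard definitions F1 = 2TP / (2TP + FP + FN) and MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), with both taken as 0 when the denominator is 0.

```python
import math
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification task."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); taken as 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; taken as 0 when any marginal total is 0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcar_mask(values, ratio, rng):
    # MCAR: every entry has the same chance of going missing, independent of any value
    return [rng.random() < ratio for _ in values]


def mnar_mask(values, ratio):
    # Hypothetical MNAR scheme: the largest values go missing,
    # so missingness depends on the unobserved value itself
    k = int(round(ratio * len(values)))
    hidden = set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k])
    return [i in hidden for i in range(len(values))]


def mean_impute(values, mask):
    # Replace masked entries with the mean of the observed entries
    observed = [v for v, m in zip(values, mask) if not m]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if m else v for v, m in zip(values, mask)]


if __name__ == "__main__":
    # Unbalanced toy labels (95 negatives, 5 positives) and a "classifier"
    # that always predicts the majority class
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    counts = confusion_counts(y_true, y_pred)
    print(f"accuracy={accuracy(*counts):.2f} f1={f1_score(*counts):.2f} mcc={mcc(*counts):.2f}")
    # → accuracy=0.95 f1=0.00 mcc=0.00

    values = list(range(1, 11))  # true mean is 5.5
    mnar = mean_impute(values, mnar_mask(values, 0.3))
    print(sum(mnar) / len(mnar))  # → 4.0

    mcar = mean_impute(values, mcar_mask(values, 0.3, random.Random(42)))
    print(sum(mcar) / len(mcar))
```

The majority-class predictor scores 0.95 accuracy while F1 and MCC are both 0, which is the kind of gap the abstract warns about. Under MCAR the observed values are a random subset, so the imputed column is only noisy; under the MNAR scheme assumed here the hidden values are systematically the largest, so mean imputation biases the column downward (4.0 against a true mean of 5.5).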

Similar Articles

Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods

Missing data are often encountered in data sets used to construct effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. In this paper, we evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern im...

Missing Data Imputation for Supervised Learning

This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on ...

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. The presence of missing data challenges the practice of model development. Several studies suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One of the issues which was of less concern...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...

Performance evaluation of different estimation methods for missing rainfall data

There are numerous methods to estimate missing values, some of which are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data from the Sarab synoptic station, East Azerbaijan province, Iran, was randomly treated as missing. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...

Journal

Journal title: Big Data and Cognitive Computing

Year: 2023

ISSN: 2504-2289

DOI: https://doi.org/10.3390/bdcc7010055